Understanding Tightly Coupled and Loosely Coupled Systems
A detailed exploration of parallel computer architectures based on their structural design
Flynn's taxonomy focuses on the behavioral aspects of parallel computers and does not consider their structural design. However, parallel computers can also be classified based on their architecture.
Systems with multiple processors working together
Links processors and memory modules
Based on how processors and memory are organized
Shared memory systems
Distributed memory systems
Flynn's classification focuses on the behavioral aspects of parallel computers:
Single Instruction, Single Data
Single Instruction, Multiple Data
Multiple Instruction, Single Data
Multiple Instruction, Multiple Data
Structural classification, on the other hand, focuses on the physical organization of the system:
Processors share global memory
Each processor has local memory
A parallel computer (MIMD) consists of multiple processors and shared memory modules or local memories connected via an interconnection network.
In tightly coupled systems, multiple processors communicate through a shared global memory. This organization is called a shared memory computer or tightly coupled system.
Every processor communicates through a shared global memory
Preferable for high-speed real-time processing
Processors are closely connected through shared memory
In tightly coupled system organization, multiple processors share a global main memory, which may have many modules. The processors also have access to I/O devices.
Inter-communication between processors, memory modules, and other devices is implemented through various interconnection networks:
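The core idea — several processors communicating through one shared global memory — can be sketched with Python's multiprocessing module, where worker processes update a single shared counter. The function names here are illustrative, not part of any standard architecture API:

```python
from multiprocessing import Process, Value, Lock

def worker(counter, lock, increments):
    # Each "processor" updates the same shared-memory word.
    for _ in range(increments):
        with lock:  # serialize access to avoid lost updates
            counter.value += 1

def run(n_procs=4, increments=1000):
    counter = Value("i", 0)  # one word of shared global memory
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock, increments))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(run())  # every processor saw the same memory word: prints 4000
```

The lock plays the role the interconnection hardware plays in a real system: it serializes conflicting accesses to the same memory location.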
Processor-Memory Interconnection Network
Input-Output-Processor Interconnection Network
Interrupt Signal Interconnection Network
This network links the processors to the different memory units.
Connecting each processor directly to each memory module
Can become very complex with many processors and memories
Used instead of complex crossbar switches
Resolves clashes when multiple processors access the same memory module
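The clash-handling idea can be sketched as a simple arbiter: in each cycle, at most one of the processors requesting the same memory module is granted access, and the rest must retry. This fixed-priority, round-based model is illustrative only, not a description of any particular multiport design:

```python
def arbitrate(requests):
    """Grant each memory module to at most one requesting processor.

    requests: dict mapping processor id -> requested module id.
    Returns (granted, blocked) where granted maps module id -> processor id.
    """
    granted, blocked = {}, []
    for proc in sorted(requests):      # fixed priority: lowest id wins
        module = requests[proc]
        if module not in granted:
            granted[module] = proc     # first requester gets the module
        else:
            blocked.append(proc)       # clash: must retry next cycle
    return granted, blocked

# Processors 0 and 2 clash on module 1; processor 1 gets module 0 alone.
print(arbitrate({0: 1, 1: 0, 2: 1}))  # -> ({1: 0, 0: 1}, [2])
```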
This interconnection network is used for communication between processors and input/output (I/O) channels.
Processors need permission from IOPIN to interact with I/O devices
Manages which processor can access which I/O channel
When one processor wants to interrupt another processor, the interrupt signal first travels to the ISIN.
ISIN passes the interrupt to the destination processor
Allows ISIN to synchronize processors by relaying interrupts
If a processor fails, ISIN broadcasts a message to the other processors
Acts as an intermediary for interrupts between processors
Coordinates and relays interrupts
Notifies all processors of any processor malfunction
Since every memory reference in a tightly coupled system goes through the interconnection network, instruction execution incurs a delay. To reduce this delay, each processor may use a cache memory to serve its frequent references locally.
Cache memory is faster than main memory
Frequent accesses are handled locally
Overall system performance is enhanced
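The benefit of the cache can be shown with a toy model (class and attribute names hypothetical): a per-processor cache sits in front of the shared memory, and only misses pay the cost of crossing the interconnection network:

```python
class CachedMemory:
    """Toy model: a per-processor cache in front of shared memory."""

    def __init__(self, shared_memory):
        self.shared = shared_memory   # reached via the interconnection network
        self.cache = {}               # local and fast
        self.network_accesses = 0     # how often we paid the network delay

    def read(self, addr):
        if addr not in self.cache:        # cache miss
            self.network_accesses += 1    # cross the network once
            self.cache[addr] = self.shared[addr]
        return self.cache[addr]           # hits are served locally

mem = CachedMemory(shared_memory={0: 10, 1: 20})
for _ in range(100):
    mem.read(0)               # 100 frequent references to the same word
print(mem.network_accesses)   # -> 1: only the first read crossed the network
```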
Shared memory multiprocessor systems can be further divided into three modes, based on the manner in which the shared memory is accessed.
Uniform Memory Access
Non-Uniform Memory Access
Cache-Only Memory Access
| Mode | Memory Access | Key Characteristic |
|---|---|---|
| UMA | Uniform for all processors | All processors have equal access time |
| NUMA | Non-uniform | Local memory access is faster than remote |
| COMA | Non-uniform | Uses cache memories instead of local memories |
In this model, main memory is uniformly shared by all processors, and each processor has equal access time to the shared memory.
All processors have same access time to memory
Used for time-sharing applications
Memory is uniformly shared by all processors
Symmetric Multiprocessors (SMPs) are common examples of UMA systems where all processors have equal access to all memory locations.
In shared memory multiprocessor systems, a local memory can be attached to every processor. The collection of all local memories forms the global memory being shared; in this way, global memory is distributed across the processors.
Uniform and fast for its corresponding processor
Slower and non-uniform, depends on location
In NUMA, all memory words are not accessed uniformly. The access time depends on the location of the memory relative to the processor.
Modern server systems with multiple CPU sockets, where each CPU has its own local memory but can access the memory of other CPUs through an interconnection network.
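Non-uniform access can be captured in a small model where local references are cheap and remote references pay an interconnect penalty. The latency numbers are hypothetical, chosen only to show the asymmetry:

```python
LOCAL_NS, REMOTE_NS = 100, 300   # hypothetical latencies in nanoseconds

def access_time(processor_node, memory_node):
    """NUMA: the cost depends on where the memory word lives."""
    return LOCAL_NS if processor_node == memory_node else REMOTE_NS

# A processor on node 0 reading its own memory vs node 1's memory:
print(access_time(0, 0))  # -> 100 (local, fast)
print(access_time(0, 1))  # -> 300 (remote, slower and non-uniform)
```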
In the NUMA model, if cache memories are used instead of local memories, the result is the COMA model. The collection of cache memories forms a global memory space.
Cache memories act as the main memory
Remote cache access is also non-uniform
Collection of all caches forms global memory
COMA can provide better performance for applications with high data locality, as data can be moved to where it is needed.
In loosely coupled systems, processors do not share global memory, since shared memory leads to memory conflicts that slow down instruction execution.
Each processor has a large local memory that is not shared with other processors. These systems have multiple processors with their own local memory and I/O devices, forming individual computer systems.
They are connected via a message passing interconnection network through which processes communicate by exchanging messages.
Each node has separate memory
Little interdependence between nodes
Since local memories can only be accessed by their attached processor, no processor is able to access remote memory. For this reason, these systems are also referred to as no-remote memory access (NORMA) systems.
Processors can only access their own local memory
Cannot directly access memory of other processors
Communication between processors is achieved through message passing rather than shared memory access.
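Message passing between the otherwise isolated nodes can be sketched with queues standing in for the interconnection network (the node protocol here is illustrative): one process owns its local memory, and the only way another process can see that data is by exchanging messages:

```python
from multiprocessing import Process, Queue

def node(inbox, outbox):
    # Each node owns its local memory; no other node can touch it directly.
    local_memory = {"x": 21}
    request = inbox.get()                  # receive a message, not a memory read
    if request == "get x":
        outbox.put(local_memory["x"] * 2)  # reply by sending a message back

def run():
    inbox, outbox = Queue(), Queue()
    p = Process(target=node, args=(inbox, outbox))
    p.start()
    inbox.put("get x")     # ask the remote node for data it owns
    result = outbox.get()  # the only way to observe another node's memory
    p.join()
    return result

if __name__ == "__main__":
    print(run())  # -> 42
```

Contrast this with the shared-memory sketch earlier: here there is no common address space at all, only explicit sends and receives, which is exactly why such systems are called NORMA.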
Multiple processors share a common memory and I/O system (e.g., multi-core processors in desktop computers)
All processors have equal access time to memory (e.g., early Sun Enterprise servers)
Modern server systems with multiple CPU sockets (e.g., AMD EPYC, Intel Xeon servers)
Collection of computers connected via a network (e.g., Beowulf clusters)
Distributed computing across multiple administrative domains (e.g., CERN's Large Hadron Collider computing grid)
Systems where components located on different networked computers communicate by passing messages (e.g., Google's search infrastructure)
| System Type | Best Use Case | Real-world Example |
|---|---|---|
| Tightly Coupled (UMA) | General-purpose computing, shared memory applications | Desktop/workstation with multi-core CPU |
| Tightly Coupled (NUMA) | High-performance computing, large database servers | Enterprise server with multiple CPU sockets |
| Loosely Coupled | Highly scalable applications, fault-tolerant systems | Supercomputer clusters, cloud computing infrastructure |
Processors share global memory, faster communication but potential conflicts
Each processor has local memory, no conflicts but communication overhead
Crucial for performance in both architectures
While Flynn's taxonomy classifies parallel computers based on instruction and data streams, structural classification focuses on the physical organization of processors and memory. Both are important for understanding parallel computer architectures.
Modern systems often combine elements of both tightly and loosely coupled architectures, creating hybrid systems that leverage the advantages of each approach. For example, a cluster of NUMA systems forms a loosely coupled system with each node being a tightly coupled system.
The choice between tightly coupled and loosely coupled systems depends on the specific requirements of the application, including performance needs, scalability requirements, and programming complexity. Understanding the structural classification helps in designing and selecting the appropriate parallel computing architecture for a given problem.